2,665 research outputs found

    Machine learning-guided directed evolution for protein engineering

    Machine learning (ML)-guided directed evolution is a new paradigm for biological design that enables optimization of complex functions. ML methods use data to predict how sequence maps to function without requiring a detailed model of the underlying physics or biological pathways. To demonstrate ML-guided directed evolution, we introduce the steps required to build ML sequence-function models and use them to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to using ML for protein engineering as well as the current literature and applications of this new engineering paradigm. ML methods accelerate directed evolution by learning from information contained in all measured variants and using that information to select sequences that are likely to be improved. We then provide two case studies that demonstrate the ML-guided directed evolution process. We also look to future opportunities where ML will enable discovery of new protein functions and uncover the relationship between protein sequence and function. Comment: Made significant revisions to focus on aspects most relevant to applying machine learning to speed up directed evolution.
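
The loop this abstract describes (learn from all measured variants, then select sequences likely to be improved) can be illustrated with a small sketch. It is not the review's method: the additive site-wise model, the function names, and the toy fitness values are all assumptions chosen to keep the example dependency-free.

```python
from collections import defaultdict
from itertools import product


def fit_additive_model(measured):
    """Fit a toy additive model: one effect per (position, residue).

    `measured` maps sequence -> measured fitness (hypothetical data).
    Each effect is the mean deviation from the global mean fitness
    over all measured variants that carry that residue at that position.
    """
    mean = sum(measured.values()) / len(measured)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, fitness in measured.items():
        for pos, aa in enumerate(seq):
            sums[(pos, aa)] += fitness - mean
            counts[(pos, aa)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return mean, effects


def predict(seq, mean, effects):
    """Predicted fitness = global mean + sum of per-site effects."""
    return mean + sum(effects.get((pos, aa), 0.0) for pos, aa in enumerate(seq))


def propose_next_round(measured, top_k=3):
    """Rank unmeasured recombinations of observed residues by predicted fitness."""
    mean, effects = fit_additive_model(measured)
    length = len(next(iter(measured)))
    # Residues seen at each position define the candidate library.
    seen = [sorted({seq[pos] for seq in measured}) for pos in range(length)]
    candidates = ("".join(combo) for combo in product(*seen))
    unmeasured = [s for s in candidates if s not in measured]
    ranked = sorted(unmeasured, key=lambda s: predict(s, mean, effects), reverse=True)
    return ranked[:top_k]


# Toy round: a three-residue parent "AAA" and three single mutants.
measured = {"AAA": 1.0, "GAA": 2.0, "AVA": 1.5, "AAL": 0.5}
print(propose_next_round(measured, top_k=2))
```

In practice the surrogate would be a regression model with uncertainty estimates and the candidate library would be far larger; the structure of the loop (fit, predict, select, measure again) stays the same.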

    Differential neuroproteomic and systems biology analysis of spinal cord injury

    Acute spinal cord injury (SCI) is a devastating condition with many consequences and no known effective treatment. Although it is quite easy to diagnose traumatic SCI, the assessment of injury severity and projection of disease progression or recovery are often challenging, as no consensus biomarkers have been clearly identified. Here rats were subjected to experimental moderate or severe thoracic SCI. At 24 h and 7 days postinjury, spinal cord segments caudal to the injury center and corresponding sham samples were harvested and subjected to differential proteomic analysis. Cationic/anionic-exchange chromatography, followed by 1D polyacrylamide gel electrophoresis, was used to reduce protein complexity. A reverse phase liquid chromatography-tandem mass spectrometry proteomic platform was then utilized to identify proteome changes associated with SCI. Twenty-two proteins were up-regulated at each of 24 h and 7 days after SCI, whereas 19 and 16 proteins were down-regulated at 24 h and 7 days after SCI, respectively, when compared with sham control. A subset of 12 proteins was identified as candidate SCI biomarkers: TF (Transferrin), FASN (Fatty acid synthase), NME1 (Nucleoside diphosphate kinase 1), STMN1 (Stathmin 1), EEF2 (Eukaryotic translation elongation factor 2), CTSD (Cathepsin D), ANXA1 (Annexin A1), ANXA2 (Annexin A2), PGM1 (Phosphoglucomutase 1), PEA15 (Phosphoprotein enriched in astrocytes 15), GOT2 (Glutamic-oxaloacetic transaminase 2), and TPI-1 (Triosephosphate isomerase 1). Data are available via ProteomeXchange with identifier PXD003473. In addition, Transferrin, Cathepsin D, TPI-1, and PEA15 were further verified in rat spinal cord tissue and/or CSF samples after SCI and in human CSF samples from moderate/severe SCI patients. Lastly, a systems biology approach was utilized to determine the critical biochemical pathways and interactome in the pathogenesis of SCI. Thus, the SCI candidate biomarkers identified can be used to correlate with disease progression or to identify potential SCI therapeutic targets.

    Design-by-analogy: experimental evaluation of a functional analogy search methodology for concept generation improvement

    Design-by-analogy is a growing field of study and practice, due to its power to augment and extend traditional concept generation methods by expanding the set of generated ideas using similarity relationships from solutions to analogous problems. This paper presents the results of experimentally testing a new method for extracting functional analogies from general data sources, such as patent databases, to assist designers in systematically seeking and identifying analogies. In summary, the approach produces significantly improved results on the novelty of solutions generated and no significant change in the total quantity of solutions generated. Computationally, this design-by-analogy facilitation methodology uses a novel functional vector space representation to quantify the functional similarity between represented design problems and, in this case, patent descriptions of products. The mapping of the patents into the functional analogous words enables the generation of functionally relevant novel ideas that can be customized in various ways. Overall, this approach provides functionally relevant novel sources of design-by-analogy inspiration to designers and design teams. SUTD-MIT International Design Centre (IDC); National Science Foundation (U.S.) (Grant Numbers CMMI-0855326, CMMI-0855510, and CMMI-08552930

    Function Based Design-by-Analogy: A Functional Vector Approach to Analogical Search

    Design-by-analogy is a powerful approach to augment traditional concept generation methods by expanding the set of generated ideas using similarity relationships from solutions to analogous problems. While the concept of design-by-analogy has been known for some time, few actual methods and tools exist to assist designers in systematically seeking and identifying analogies from general data sources, databases, or repositories, such as patent databases. A new method for extracting functional analogies from data sources has been developed to provide this capability, here based on a functional basis rather than form or conflict descriptions. Building on past research, we utilize a functional vector space model (VSM) to quantify analogous similarity of an idea's functionality. We quantitatively evaluate the functional similarity between represented design problems and, in this case, patent descriptions of products. We also develop document parsing algorithms to reduce text descriptions of the data sources down to the key functions, for use in the functional similarity analysis and functional vector space modeling. To do this, we apply Zipf's law on word count order reduction to reduce the words within the documents down to the applicable functionally critical terms, thus providing a mapping process for function based search. The reduction of a document into functional analogous words enables the matching to novel ideas that are functionally similar, which can be customized in various ways. This approach thereby provides relevant sources of design-by-analogy inspiration. As a verification of the approach, two original design problem case studies illustrate the distance range of analogical solutions that can be extracted. This range extends from very near-field, literal solutions to far-field cross-domain analogies. National Science Foundation (U.S.) (Grant CMMI-0855326); National Science Foundation (U.S.) (Grant CMMI-0855510); National Science Foundation (U.S.) (Grant CMMI-0855293); SUTD-MIT International Design Centre (IDC)
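
The functional vector space idea in this abstract (reduce each document to its key function terms, then score similarity between a design problem and a patent description) can be sketched as follows. This is only an illustration under stated assumptions: `FUNCTION_TERMS` is a hypothetical vocabulary, the frequency cutoff `max_terms` is a crude stand-in for the paper's Zipf's-law reduction, and cosine similarity is an assumed choice of metric that the abstract does not specify.

```python
import math
import re
from collections import Counter

# Hypothetical functional vocabulary; the paper derives its own term set.
FUNCTION_TERMS = {"clean", "rotate", "spray", "move", "grip", "cut", "heat"}


def functional_vector(text, max_terms=5):
    """Reduce a text to counts of its most frequent functional terms.

    Keeping only the top `max_terms` terms is an assumed simplification
    of a rank-frequency (Zipf-style) reduction.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w in FUNCTION_TERMS)
    return dict(counts.most_common(max_terms))


def cosine_similarity(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(problem, documents):
    """Return (score, name) pairs sorted from most to least similar."""
    query = functional_vector(problem)
    scored = [
        (cosine_similarity(query, functional_vector(text)), name)
        for name, text in documents.items()
    ]
    return sorted(scored, reverse=True)


# Toy design problem and made-up "patent" snippets.
problem = "clean and move across a window"
documents = {
    "wiper": "blades move and clean the glass, then clean again",
    "oven": "heat the chamber and heat the tray",
    "lathe": "rotate and cut the part",
}
ranked = rank_documents(problem, documents)
print(ranked[0][1])
```

Documents that share no function terms with the problem score zero here; a fuller treatment would need stemming and synonym handling before far-field analogies could surface.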

    Protein structure generation via folding diffusion

    The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting as-yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a new diffusion-based generative model that designs protein backbone structures via a procedure that mirrors the native folding process. We describe protein backbone structure as a series of consecutive angles capturing the relative orientation of the constituent amino acid residues, and generate new structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins biologically twist into energetically favorable conformations, but the inherent shift and rotational invariance of this representation also crucially alleviates the need for complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release the first open-source codebase and trained models for protein structure diffusion.
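
To make the angle-based representation concrete, the forward (noising) half of a denoising diffusion probabilistic model on a sequence of angles might look like the sketch below. The linear beta schedule, the wrapping of noisy values back into [-pi, pi), and all function names are assumptions for illustration; the abstract does not give these details, and the learned denoising network is omitted entirely.

```python
import math
import random


def wrap_angle(theta):
    """Map an angle in radians into [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t.

    Uses a linear beta schedule, which is an assumed choice here.
    """
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / max(num_steps - 1, 1)
        prod *= 1.0 - beta
    return prod


def noise_angles(angles, t, num_steps, rng):
    """Sample x_t given x_0 in closed form, then wrap each angle.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    with eps drawn from a standard normal distribution.
    """
    a_bar = alpha_bar(t, num_steps)
    noisy = []
    for theta in angles:
        eps = rng.gauss(0.0, 1.0)
        noisy.append(wrap_angle(math.sqrt(a_bar) * theta + math.sqrt(1.0 - a_bar) * eps))
    return noisy


# Toy backbone: a few made-up angles in radians.
rng = random.Random(0)
angles = [-1.05, -0.79, 2.36, -1.20]
slightly_noisy = noise_angles(angles, t=10, num_steps=1000, rng=rng)
fully_noisy = noise_angles(angles, t=1000, num_steps=1000, rng=rng)
print(slightly_noisy, fully_noisy)
```

Generation would run this process in reverse: start from random angles and repeatedly apply a trained network that predicts the noise to remove at each step.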

    Facilitating Design-by-Analogy: Development of a Complete Functional Vocabulary and Functional Vector Approach to Analogical Search

    Design-by-analogy is an effective approach to innovative concept generation, but can be elusive at times due to the fact that few methods and tools exist to assist designers in systematically seeking and identifying analogies from general data sources, databases, or repositories, such as patent databases. A new method for extracting analogies from data sources has been developed to provide this capability. Building on past research, we utilize a functional vector space model to quantify analogous similarity between a design problem and the data source of potential analogies. We quantitatively evaluate the functional similarity between represented design problems and, in this case, patent descriptions of products. We develop a complete functional vocabulary to map the patent database to applicable functionally critical terms, using document parsing algorithms to reduce text descriptions of the data sources down to the key functions, and applying Zipf’s law on word count order reduction to reduce the words within the documents. The reduction of a document (in this case a patent) into functional analogous words enables the matching to novel ideas that are functionally similar, which can be customized in various ways. This approach thereby provides relevant sources of design-by-analogy inspiration. Although our implementation of the technique focuses on functional descriptions of patents and the mapping of these functions to those of the design problem, resulting in a set of analogies, we believe that this technique is applicable to other analogy data sources as well. As a verification of the approach, an original design problem for an automated window washer illustrates the distance range of analogical solutions that can be extracted, extending from very near-field, literal solutions to far-field cross-domain analogies. Finally, a comparison with a current patent search tool is performed to draw a contrast to the status quo and evaluate the effectiveness of this work. National Science Foundation (U.S.) (grant number CMMI-0855510); National Science Foundation (U.S.) (grant number CMMI-0855326); National Science Foundation (U.S.) (grant number CMMI-0855293); SUTD-MIT International Design Centre (IDC)